Scalable sentiment classification for Big Data analysis using Naïve Bayes Classifier
Abstract
A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to "learn" from a training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this paper, we aim to evaluate the scalability of the Naïve Bayes classifier (NBC) on large datasets. Instead of using a standard library (e.g., Mahout), we implemented NBC ourselves to achieve fine-grained control of the analysis procedure. A Big Data analysis system is also designed for this study. The results are encouraging: the accuracy of NBC improves as the dataset size increases, approaching 82%. We have demonstrated that NBC is able to scale up to analyze the sentiment of millions of movie reviews with increasing throughput.
Keywords: Cloud computing, Big data, Polarity mining, Sentiment classification
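To make the classifier named in the abstract concrete, the following is a minimal single-machine sketch of a multinomial Naïve Bayes sentiment classifier. It is not the authors' implementation, which the abstract describes only as a custom build designed for large datasets. The function names (`tokenize`, `train`, `classify`), the whitespace tokenizer, the add-one (Laplace) smoothing, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely handle punctuation, stop words, and so on.
    return text.lower().split()


def train(labeled_docs):
    """Count documents per label and word occurrences per label.

    labeled_docs is an iterable of (text, label) pairs.
    """
    doc_counts = Counter()              # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()                       # all words seen during training
    for text, label in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior for the text."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, far smaller than the movie review sets in the paper.
TOY_REVIEWS = [
    ("a great and moving film", "pos"),
    ("great acting great story", "pos"),
    ("a dull and boring film", "neg"),
    ("boring plot terrible acting", "neg"),
]

if __name__ == "__main__":
    model = train(TOY_REVIEWS)
    print(classify("great story", model))
    print(classify("boring terrible film", model))
```

Training only accumulates counts, so counts computed on separate partitions of the data can be summed. That property is one reason this kind of classifier is a natural candidate for large-scale processing, although how the paper's system distributes the work is not stated in the abstract.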
Similar resources
Scalable Sentiment Classification for Big Data Analysis Using Naïve Bayes Classifier
A typical method to obtain valuable information is to extract the sentiment or opinion from a message. Machine learning technologies are widely used in sentiment classification because of their ability to “learn” from the training dataset to predict or support decision making with relatively high accuracy. However, when the dataset is large, some algorithms might not scale up well. In this pape...
Sentiment Analysis of Restaurant Reviews Using Hybrid Classification Method
The area of sentiment mining (also called sentiment extraction, opinion mining, opinion extraction, sentiment analysis, etc.) has seen a large increase in academic interest in the last few years. Researchers in the areas of natural language processing, data mining, machine learning, and others have tested a variety of methods of automating the sentiment analysis process. In this research work, ...
Sentiment Classification of Movie Reviews Using Hybrid Method
The area of sentiment mining (also called sentiment extraction, opinion mining, opinion extraction, sentiment analysis, etc.) has seen a large increase in academic interest in the last few years. Researchers in the areas of natural language processing, data mining, machine learning, and others have tested a variety of methods of automating the sentiment analysis process. In this research work, ...
A Data Analytic Framework for Unstructured Text Hassanin
This paper describes a systematic flow of the unstructured data in industry, collected data, stored data, and the amount of data. Big data uses a scalable storage index and a distributed approach to retrieve required information. Therefore, the paper introduces an unstructured data framework for managing and discovering using the 3Vs of big data: variety, velocity, and volume. Different approaches f...
Review Paper on Sentiment Analysis of Twitter Data Using Text Mining and Hybrid Classification Approach
In sentiment analysis, natural language processing is used to extract information from writers' comments or reviews. In this paper, we use text mining and a hybrid approach combining the KNN algorithm and the Naïve Bayes algorithm to find the sentiments of Indian people on Twitter.
Publication date: 2013